Abstract

The forward estimation problem for stationary and ergodic time
series $\{X_n\}_{n=0}^{\infty}$ taking values from a finite alphabet $\mathcal{X}$ is to estimate
the probability that $X_{n+1} = x$ based on the observations $X_i$, $0 \le i \le n$,
without prior knowledge of the distribution of the process $\{X_n\}$.
We present a simple procedure $g_n$ which is evaluated on the data
segment $(X_0, \dots, X_n)$ and for which ${\rm error}(n) = |g_n(x) - P(X_{n+1} = x|X_0, \dots, X_n)| \to 0$
almost surely for a subclass of all stationary and
ergodic time series, while for the full class the Cesàro average of the
error tends to zero almost surely and, moreover, the error tends to zero
in probability.

The forward estimation problem for a stationary and ergodic time
series $\{X_n\}_{n=0}^{\infty}$ taking values in a finite alphabet $\mathcal{X}$
is to estimate the probability that $X_{n+1} = x$, given the $X_i$ for
$0 \le i \le n$ but without prior knowledge of the distribution of the
process $\{X_i\}$. We present a simple procedure $g_n$, evaluated on
the data $(X_0, \dots, X_n)$, for which ${\rm error}(n) = |g_n(x) - P(X_{n+1} = x|X_0, \dots, X_n)| \to 0$
almost surely for a subclass of all stationary and ergodic time series,
while for the whole class the Cesàro average of the error tends to zero almost
surely. Moreover, the error tends to zero in probability.
1 Introduction

T. Cover [6] posed two fundamental problems concerning estimation for stationary
and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$. (Note that a stationary
time series $\{X_n\}_{n=0}^{\infty}$ can be extended to be a two sided stationary time series
$\{X_n\}_{n=-\infty}^{\infty}$.) Cover's first problem was on backward estimation.

Problem 1 Is there an estimation scheme $f_n$ for the value $P(X_1 = 1|X_{-n}, \dots, X_0)$
such that $f_n$ depends solely on the observed data segment $(X_{-n}, \dots, X_0)$ and

$$\lim_{n\to\infty} |f_n(X_{-n}, \dots, X_0) - P(X_1 = 1|X_{-n}, \dots, X_0)| = 0$$

almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$?

This problem was solved by Ornstein [20] by constructing such a scheme.
(See also Bailey [5].) Ornstein's scheme is not a simple one and the proof of
consistency is rather sophisticated. For an even more general case, a much
simpler scheme and proof of consistency were provided by Morvai, Yakowitz,
Györfi [19]. (See also Algoet [1] and Weiss [21].) Note that none of these
schemes is reasonable from the data consumption point of view.

Cover's second problem was on forward estimation.

Problem 2 Is there an estimation scheme $f_n$ for the value $P(X_{n+1} = 1|X_0, \dots, X_n)$
such that $f_n$ depends solely on the data segment $(X_0, \dots, X_n)$ and

$$\lim_{n\to\infty} |f_n(X_0, \dots, X_n) - P(X_{n+1} = 1|X_0, \dots, X_n)| = 0$$

almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$?

This problem was answered by Bailey [5] in a negative way, that is, he showed
that there is no such scheme. (Also see Ryabko [22], Györfi, Morvai, Yakowitz
[11] and Weiss [21].) Bailey used the technique of cutting and stacking
developed by Ornstein [21] and Shields [23]. Ryabko's construction was based
on a function of an infinite state Markov chain.

Morvai [16] addressed a modified version of Problem 2. There one is not
required to predict for all time instances; rather, one may refuse to predict
for certain values of $n$. However, one is expected to predict infinitely often.
Morvai [16] proposed a sequence of stopping times $\lambda_n$ and managed to estimate
the conditional probability $P(X_{\lambda_n+1} = 1|X_0, \dots, X_{\lambda_n})$ in the pointwise
sense, that is, for his estimator along the proposed stopping time sequence,
the error tends to zero as $n$ increases, almost surely. Another estimator was
proposed for this modified Problem 2 by Morvai and Weiss [17] for which the
$\lambda_n$ grow more slowly, but the consistency only holds for a certain subclass of
all stationary binary time series.

In this paper we consider the original Problem 2 but we shall impose an
additional restriction on the possible time series. The conditional probability
$P(X_1 = 1| \dots, X_{-1}, X_0)$ is said to be continuous if a version of it is continuous
with respect to the metric $\sum_{i=0}^{\infty} 2^{-i-1}|x_{-i} - y_{-i}|$, where $x_{-i}, y_{-i} \in \{0, 1\}$.

Problem 3 Is there an estimation scheme $f_n$ for the value $P(X_{n+1} = 1|X_0, \dots, X_n)$
such that $f_n$ depends solely on the data segment $(X_0, \dots, X_n)$ and

$$\lim_{n\to\infty} |f_n(X_0, \dots, X_n) - P(X_{n+1} = 1|X_0, \dots, X_n)| = 0$$

almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$
with continuous conditional probability $P(X_1 = 1| \dots, X_{-1}, X_0)$?

We will answer this question in the affirmative. This class includes all
$k$-step Markov chains. It is not known whether the schemes proposed by Bailey [5],
Ornstein [20], Morvai, Yakowitz, Györfi [19] solve Problem 3.

Problem 4 Is there an estimation scheme $f_n$ for the value $P(X_{n+1} = 1|X_0, \dots, X_n)$
such that $f_n$ depends solely on the data segment $(X_0, \dots, X_n)$ and

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} |f_i(X_0, \dots, X_i) - P(X_{i+1} = 1|X_0, \dots, X_i)| = 0$$

almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$?

Bailey [5] (cf. Algoet [2] also) showed that any scheme that solves Problem
1 can be easily modified to solve Problem 4 (indeed, just exchange the data
segment $(X_{-n}, \dots, X_0)$ for $(X_0, \dots, X_n)$), but apparently not all solutions
of Problem 4 arise in this fashion. For further reading cf. Algoet [1],
Morvai, Yakowitz, Györfi [19], Györfi et al. [8], Györfi, Lugosi and Morvai
[10], Györfi and Lugosi [9] and Weiss [21].

Problem 5 Is there an estimation scheme $f_n$ for the value $P(X_{n+1} = 1|X_0, \dots, X_n)$
such that $f_n$ depends solely on the data segment $(X_0, \dots, X_n)$ and for
arbitrary $\epsilon > 0$,

$$\lim_{n\to\infty} P(|f_n(X_0, \dots, X_n) - P(X_{n+1} = 1|X_0, \dots, X_n)| > \epsilon) = 0$$

for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$?
By stationarity, for any scheme that solves Problem 1, the shifted version
of it solves Problem 5. (Just replace the data segment $(X_{-n}, \dots, X_0)$ by
$(X_0, \dots, X_n)$.)

There are existing schemes that solve Problem 4 (e.g. Bailey [5], Ornstein
[20], and for an even more general case Morvai, Yakowitz, Györfi [19], Algoet
[1], Györfi and Lugosi [9]) and there are schemes that solve Problem 5 (e.g.
Bailey [5], Ornstein [20] and for an even more general case Morvai, Yakowitz,
Györfi [19], Algoet [1], Morvai, Yakowitz and Algoet [18]). In this paper
we propose a reasonable, very simple algorithm that simultaneously solves
Problems 3, 4 and 5. Note that the schemes given by Bailey [5], Ornstein [20],
Morvai, Yakowitz, Györfi [19], Algoet [1] and Weiss [21] are not reasonable
at all: they consume data extremely rapidly, cf. Morvai [15], and it is not
known whether their schemes solve Problem 3.

2 Preliminaries and Main Results 

Let $\{X_n\}_{n=-\infty}^{\infty}$ be a stationary time series taking values from a finite alphabet
$\mathcal{X}$. (Note that all stationary time series $\{X_n\}_{n=0}^{\infty}$ can be thought to be a two
sided time series, that is, $\{X_n\}_{n=-\infty}^{\infty}$.) For notational convenience, let
$X_m^n = (X_m, \dots, X_n)$, where $m \le n$. Note that if $m > n$ then $X_m^n$ is the
empty string.

Let $g : \mathcal{X} \to (-\infty, \infty)$ be arbitrary.

Our goal is to estimate the conditional expectation $E(g(X_{n+1})|X_0^n)$ from
samples $X_0^n$.

For $k \ge 1$ define the stopping times $\tau_i^k(n)$ which indicate where the $k$-block
$X_{n-k+1}^n$ occurs previously in the time series $\{X_n\}$. Formally we set $\tau_0^k(n) = 0$
and for $i \ge 1$ let

$$\tau_i^k(n) = \min\{t > \tau_{i-1}^k(n) : X_{n-k+1-t}^{n-t} = X_{n-k+1}^n\}. \qquad (1)$$

Let $K_n \ge 1$ and $J_n \ge 1$ be sequences of nondecreasing positive integers
tending to $\infty$ which will be fixed later.

Define $\kappa_n$ as the largest $1 \le k \le K_n$ such that there are at least $J_n$
occurrences of the block $X_{n-k+1}^n$ in the data segment $X_0^n$, that is,

$$\kappa_n = \max\{1 \le k \le K_n : \tau_{J_n}^k(n) \le n - k + 1\} \qquad (2)$$

if there is such $k$ and $0$ otherwise.
Define $\lambda_n$ as the number of occurrences of the block $X_{n-\kappa_n+1}^n$ in the data
segment $X_0^n$, that is,

$$\lambda_n = \max\{1 \le j : \tau_j^{\kappa_n}(n) \le n - \kappa_n + 1\} \qquad (3)$$

if $\kappa_n > 0$ and zero otherwise. Observe that if $\kappa_n > 0$ then $\lambda_n \ge J_n$.
Our estimate $g_n$ for $E(g(X_{n+1})|X_0^n)$ is defined as $g_0 = 0$ and for $n \ge 1$,

$$g_n = \frac{1}{\lambda_n}\sum_{i=1}^{\lambda_n} g(X_{n-\tau_i^{\kappa_n}(n)+1}) \qquad (4)$$

if $\kappa_n > 0$ and zero otherwise.
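
The definitions (1) to (4) can be read as a short procedure. The following Python sketch is one possible direct transcription, not an optimized or authoritative implementation: the function names `recurrence_times` and `forward_estimate` are hypothetical, the data segment $X_0^n$ is passed as a list `x` with `x[i]` standing for $X_i$, and $K_n$, $J_n$ are supplied by the caller.

```python
from typing import Callable, List, Sequence


def recurrence_times(x: Sequence[int], n: int, k: int) -> List[int]:
    """Lags t >= 1 with x[n-k+1-t .. n-t] == x[n-k+1 .. n] and t <= n-k+1,
    in increasing order: tau_1^k(n), tau_2^k(n), ... as in (1)."""
    if k > n:  # no earlier copy of the k-block fits inside x[0..n]
        return []
    block = list(x[n - k + 1 : n + 1])
    return [t for t in range(1, n - k + 2)
            if list(x[n - k + 1 - t : n - t + 1]) == block]


def forward_estimate(x: Sequence[int], g: Callable[[int], float],
                     K_n: int, J_n: int) -> float:
    """Estimate g_n of E(g(X_{n+1}) | X_0^n), where n = len(x) - 1."""
    n = len(x) - 1
    if n < 1:
        return 0.0  # g_0 = 0 by definition
    for k in range(K_n, 0, -1):  # try the longest block first, as in (2)
        taus = recurrence_times(x, n, k)
        if len(taus) >= J_n:  # equivalent to tau_{J_n}^k(n) <= n - k + 1
            # Here kappa_n = k and lambda_n = len(taus), see (3).
            # Average g over the symbols that followed each earlier copy, see (4).
            return sum(g(x[n - t + 1]) for t in taus) / len(taus)
    return 0.0  # kappa_n = 0


if __name__ == "__main__":
    data = [0, 0, 1, 0, 0, 1, 0, 0]
    # With g the identity on {0, 1}, g_n estimates P(X_{n+1} = 1 | X_0^n).
    print(forward_estimate(data, float, K_n=2, J_n=2))
```

In the example the block `(0, 0)` occurred twice before, each time followed by `1`, so with $K_n = J_n = 2$ the sketch returns 1.0. Raising $J_n$ to 3 forces the shorter block `(0)`, and the estimate becomes the fraction of earlier zeros that were followed by a one.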

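Let $\mathcal{X}^{*-}$ be the set of all one-sided sequences, that is,

$$\mathcal{X}^{*-} = \{(\dots, x_{-1}, x_0) : x_i \in \mathcal{X} \text{ for all } -\infty < i \le 0\}.$$

Define the function $G : \mathcal{X}^{*-} \to (-\infty, \infty)$ as

$$G(x_{-\infty}^0) = E(g(X_1)|X_{-\infty}^0 = x_{-\infty}^0).$$

Note that as a conditional expectation this is only defined almost surely. E.g.
if $g(x) = 1_{\{x=z\}}$ for a fixed $z \in \mathcal{X}$ then $G(y_{-\infty}^0) = P(X_1 = z|X_{-\infty}^0 = y_{-\infty}^0)$.

Define a distance on $\mathcal{X}^{*-}$ as

$$d^*(x_{-\infty}^0, y_{-\infty}^0) = \sum_{i=0}^{\infty} 2^{-i-1} 1_{\{x_{-i} \ne y_{-i}\}}.$$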
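
To make the metric concrete, the sketch below evaluates the sum defining $d^*$ on finite truncations of two pasts. It assumes the weights $2^{-i-1}$ on coordinate $-i$, and the function name `d_star` and the convention that list entry `i` holds the symbol at time $-i$ are illustrative choices only. Two pasts that agree on their most recent $k$ symbols are at distance at most $2^{-k}$, which is why continuity of $G$ means that $G$ depends only weakly on the remote past.

```python
from typing import Sequence


def d_star(x_past: Sequence[int], y_past: Sequence[int]) -> float:
    """Truncated d*: x_past[i] and y_past[i] hold the symbols at time -i.

    Only the coordinates present in both truncations are compared, so the
    result is a lower bound on the distance between any two extensions.
    """
    return sum(2.0 ** (-(i + 1))
               for i, (a, b) in enumerate(zip(x_past, y_past))
               if a != b)


if __name__ == "__main__":
    # The two pasts agree at times 0, -1, -2 and differ at times -3 and -4.
    print(d_star([0, 0, 0, 1, 1], [0, 0, 0, 0, 0]))
```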

Definition The conditional expectation $G(X_{-\infty}^0)$ is said to be continuous
if a version of it is continuous on the set $\mathcal{X}^{*-}$ with respect to the metric $d^*(\cdot, \cdot)$.
Since this space is compact, continuity is in fact equivalent to uniform
continuity.

The processes with continuous conditional expectation are essentially the
Random Markov Processes of Kalikow [12], or the continuous $g$-measures
studied by Mike Keane [13].

Theorem Let $\{X_n\}$ be a stationary and ergodic time series taking values
from a finite alphabet $\mathcal{X}$. Assume $K_n = \max(1, \lfloor 0.1 \log_{|\mathcal{X}|} n \rfloor)$ and $J_n =
\max(1, \lceil n^{0.5} \rceil)$. Then

(A) if the conditional expectation $G(X_{-\infty}^0)$ is continuous with respect to the
metric $d^*(\cdot, \cdot)$ then

$$\lim_{n\to\infty} |g_n - E(g(X_{n+1})|X_0^n)| = 0 \quad \text{almost surely},$$

(B) without any continuity assumption,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} |g_i - E(g(X_{i+1})|X_0^i)| = 0 \quad \text{almost surely},$$

(C) without any continuity assumption, for arbitrary $\epsilon > 0$,

$$\lim_{n\to\infty} P(|g_n - E(g(X_{n+1})|X_0^n)| > \epsilon) = 0.$$
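
The proofs use the two sequences only through a few growth properties, in particular that $|\mathcal{X}|^{K_n} J_n / n$ tends to zero. The sketch below tabulates this ratio. It is illustrative only: the function names are hypothetical, and $J_n$ is taken to be $\lceil n^{0.5} \rceil$, which is an assumption of this sketch and should be replaced by whatever choice of $J_n$ the Theorem is read with.

```python
import math


def block_length_bound(n: int, alphabet_size: int) -> int:
    """K_n = max(1, floor(0.1 * log_{|X|} n)), for n >= 1 and |X| >= 2."""
    return max(1, math.floor(0.1 * math.log(n) / math.log(alphabet_size)))


def occurrence_threshold(n: int) -> int:
    """J_n = max(1, ceil(sqrt(n))) for n >= 1 (assumed form of J_n)."""
    return max(1, math.isqrt(n - 1) + 1)  # integer ceiling of sqrt(n)


def shortage_ratio(n: int, alphabet_size: int) -> float:
    """|X|^{K_n} * J_n / n, which should tend to zero as n grows."""
    k_n = block_length_bound(n, alphabet_size)
    return alphabet_size ** k_n * occurrence_threshold(n) / n


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, block_length_bound(n, 2), occurrence_threshold(n),
              shortage_ratio(n, 2))
```

The table makes the slow growth of $K_n$ visible: for a binary alphabet the block length stays at 1 until $n$ reaches $2^{20}$, so the estimator matches long contexts only on very long samples.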

Remarks: 

Note that these results are valid without the ergodic assumption. One may
use the ergodic decomposition throughout the proofs, cf. Gray [7] p. 268.

We note that from the proof of Ryabko [22] and Györfi, Morvai, Yakowitz [11]
it is clear that the continuity condition in the first part of the Theorem
cannot be relaxed. Even for the class of all stationary and ergodic binary time
series with merely almost surely continuous conditional probability $P(X_1 =
1| \dots, X_{-1}, X_0)$ one cannot solve Problem 2 in the Introduction. (An almost
surely continuous conditional probability is one which, as a function restricted
to a set $C$ of full measure, is continuous on $C$.)

We do not know if the shifted version of our proposed scheme $g_n$ solves
Problem 1 or not. (That is, in the case when $g_n$ is evaluated on $(X_{-n}, \dots, X_0)$
rather than on $(X_0, \dots, X_n)$.)

If $\mathcal{X}$ is a countably infinite alphabet then there is no scheme that could
achieve a result similar to part (A) of the Theorem for all bounded $g(\cdot)$, even
if one assumes that the resulting $G(\cdot)$ is continuous and the time series is
in fact a first order Markov chain. Indeed, whenever a new state appears
which has not occurred before, one is unable to predict, cf. Györfi, Morvai,
Yakowitz [11].
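3 Auxiliary Results

For $k \ge 1$, $n \ge 0$ and $j \ge 0$ it will be useful to define auxiliary processes
$\{\tilde{X}_i^{(k,n,j)}\}_{i=-\infty}^{\infty}$ as follows. Let

$$\tilde{X}_i^{(k,n,j)} = X_{n-\tau_j^k(n)+i} \quad \text{for } -\infty < i < \infty. \qquad (5)$$

For an arbitrary stationary time series $\{Y_n\}$ and for $k \ge 1$ let $\hat{\tau}_0^k(Y_{-\infty}^{\infty}) = 0$ and
for $i \ge 1$ define

$$\hat{\tau}_i^k(Y_{-\infty}^{\infty}) = \min\{t > \hat{\tau}_{i-1}^k(Y_{-\infty}^{\infty}) : Y_{-k+1+t}^{t} = Y_{-k+1}^{0}\}. \qquad (6)$$

If it is obvious on which time series $\hat{\tau}_i^k(Y_{-\infty}^{\infty})$ is evaluated, we will write $\hat{\tau}_i^k$.
Let $T$ denote the left shift, that is, $(Tx_{-\infty}^{\infty})_i = x_{i+1}$.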
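
The backward times in (1) and the forward times in (6) are two views of the same list of occurrences. If the series is re-centred at time $n - \tau_J^k(n)$ as in (5), the forward recurrence times of the block seen from the new origin are $\hat{\tau}_j^k = \tau_J^k(n) - \tau_{J-j}^k(n)$ for $1 \le j \le J$. The Python sketch below checks this identity on a finite sequence. The helper names are hypothetical, and the finite list `x` stands in for a window of the two-sided series.

```python
from typing import List, Sequence


def backward_times(x: Sequence[int], n: int, k: int) -> List[int]:
    """tau_1^k(n) < tau_2^k(n) < ...: lags t >= 1, t <= n-k+1, at which the
    block x[n-k+1 .. n] reappears as x[n-k+1-t .. n-t], as in (1)."""
    block = list(x[n - k + 1 : n + 1])
    return [t for t in range(1, n - k + 2)
            if list(x[n - k + 1 - t : n - t + 1]) == block]


def forward_times(x: Sequence[int], origin: int, k: int, count: int) -> List[int]:
    """First `count` lags t >= 1 at which the block ending at `origin`
    reappears as x[origin-k+1+t .. origin+t], as in (6) with Y_i = x[origin+i].
    Assumes origin - k + 1 >= 0."""
    block = list(x[origin - k + 1 : origin + 1])
    found: List[int] = []
    t = 1
    while len(found) < count and origin + t < len(x):
        if list(x[origin - k + 1 + t : origin + t + 1]) == block:
            found.append(t)
        t += 1
    return found


def identity_holds(x: Sequence[int], n: int, k: int, J: int) -> bool:
    """Check hat-tau_j = tau_J - tau_{J-j} for j = 1..J, with tau_0 = 0."""
    taus = [0] + backward_times(x, n, k)  # taus[i] = tau_i^k(n)
    if J >= len(taus):
        raise ValueError("the block has fewer than J earlier occurrences")
    expected = [taus[J] - taus[J - j] for j in range(1, J + 1)]
    return forward_times(x, n - taus[J], k, J) == expected


if __name__ == "__main__":
    print(identity_holds([0, 0, 1, 0, 0, 1, 0, 0], n=7, k=1, J=3))
```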

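We will need the next lemmas for later use.

Lemma 1 Let $\{X_n\}$ be a stationary time series taking values from a finite
alphabet $\mathcal{X}$. For $k \ge 1$, $n \ge 0$ and $j \ge 0$, the time series $\{\tilde{X}_i^{(k,n,j)}\}_{i=-\infty}^{\infty}$ has
the same distribution as $\{X_i\}_{i=-\infty}^{\infty}$.

Proof Let $m \le l$ be arbitrary integers and let $x_m^l$ be an arbitrary block. Note that by (1), (5) and (6),

$$T^{n-s}\{\tilde{X}_m^{(k,n,j)} = x_m, \dots, \tilde{X}_l^{(k,n,j)} = x_l, \tau_j^k(n) = s\} = \{X_m^l = x_m^l, \hat{\tau}_j^k = s\}$$

where $\hat{\tau}_j^k$ is evaluated on the time series $\{X_i\}_{i=-\infty}^{\infty}$. Now by (5) and stationarity,

$$P(\tilde{X}_m^{(k,n,j)} = x_m, \dots, \tilde{X}_l^{(k,n,j)} = x_l) = \sum_{s=j}^{\infty} P(\tilde{X}_m^{(k,n,j)} = x_m, \dots, \tilde{X}_l^{(k,n,j)} = x_l, \tau_j^k(n) = s)$$

$$= \sum_{s=j}^{\infty} P(X_m^l = x_m^l, \hat{\tau}_j^k = s) = P(X_m^l = x_m^l)$$

and the proof of Lemma 1 is complete.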

Lemma 2 Let $\{X_n\}$ be a stationary and ergodic time series taking values
from a finite alphabet $\mathcal{X}$. Assume $K_n \to \infty$, $J_n \to \infty$ and $\lim_{n\to\infty} \frac{J_n}{n} = 0$.
Then

$$\lim_{n\to\infty} \kappa_n = \infty \quad \text{almost surely}.$$

Proof We argue by contradiction. Suppose that $\kappa_{n_i} = K$ and $X_{n_i-K}^{n_i} = x_{-K}^0$
for a subsequence $n_i$. Then a simple frequency count (in the data segment
$X_0^{n_i}$ there are less than $J_{n_i}$ occurrences of the block $x_{-K}^0$) yields that

$$P(X_{-K}^0 = x_{-K}^0) \le \lim_{i\to\infty} \frac{J_{n_i}}{n_i} = 0.$$

The set of sequences that contain a block with zero probability has zero
probability and thus Lemma 2 is proved.

4 Pointwise Consistency 



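Proof of Theorem (A). By Lemma 2, for large $n$,

$$|g_n - E(g(X_{n+1})|X_0^n)| \le \max_{J=J_n,\dots,n}\ \max_{k=1,\dots,K_n} \left| \frac{1}{J}\sum_{j=1}^{J} \left[ g(X_{n-\tau_j^k(n)+1}) - G(X_{-\infty}^{n-\tau_j^k(n)}) \right] \right| + \left| \frac{1}{\lambda_n}\sum_{j=1}^{\lambda_n} G(X_{-\infty}^{n-\tau_j^{\kappa_n}(n)}) - E(g(X_{n+1})|X_0^n) \right|.$$

Concerning the first term, by (1), (5) and (6),

$$\frac{1}{J}\sum_{j=1}^{J} \left[ g(X_{n-\tau_j^k(n)+1}) - G(X_{-\infty}^{n-\tau_j^k(n)}) \right] = \frac{1}{J}\sum_{j=0}^{J-1} \left[ g(\tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k+1}) - G(\dots, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k-1}, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k}) \right]$$

where $\hat{\tau}_j^k$ is evaluated on $\{\tilde{X}_i^{(k,n,J)}\}_{i=-\infty}^{\infty}$. Since by Lemma 1

$$G(\dots, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k-1}, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k}) = E(g(\tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k+1})| \dots, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k-1}, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k}),$$

the pair

$$\left( \Gamma_j = g(\tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k+1}) - G(\dots, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k-1}, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k}),\ \ \mathcal{F}_j = \sigma(\dots, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k-1}, \tilde{X}^{(k,n,J)}_{\hat{\tau}_j^k}) \right) \qquad (7)$$

forms a martingale difference sequence ($E(\Gamma_j|\mathcal{F}_j) = 0$ and $\Gamma_j$ is measurable with
respect to $\mathcal{F}_{j+1}$) for which Azuma's exponential bound (cf. Azuma [1]) yields

$$P\left( \left| \frac{1}{J}\sum_{j=0}^{J-1} \Gamma_j \right| > \epsilon \right) \le 2e^{-\epsilon^2 J/(2B^2)}$$

for any $B$ such that $2\max_{x\in\mathcal{X}} |g(x)| < B$. Now by the union bound,

$$P\left( \max_{J=J_n,\dots,n}\ \max_{k=1,\dots,K_n} \left| \frac{1}{J}\sum_{j=1}^{J} \left[ g(X_{n-\tau_j^k(n)+1}) - G(X_{-\infty}^{n-\tau_j^k(n)}) \right] \right| > \epsilon \right) \le nK_n 2e^{-\epsilon^2 J_n/(2B^2)}$$

and by assumption $nK_n 2e^{-\epsilon^2 J_n/(2B^2)}$ sums up and the Borel-Cantelli Lemma
yields almost sure convergence to zero. Concerning the second term,

$$\left| \frac{1}{\lambda_n}\sum_{j=1}^{\lambda_n} G(X_{-\infty}^{n-\tau_j^{\kappa_n}(n)}) - E\left[ G(X_{-\infty}^n)|X_0^n \right] \right| \to 0 \quad \text{almost surely}$$

since $\kappa_n$ tends to infinity by Lemma 2, $X_{n-\tau_j^{\kappa_n}(n)-\kappa_n+1}^{n-\tau_j^{\kappa_n}(n)} = X_{n-\kappa_n+1}^n$ for $0 \le j \le \lambda_n$,
and the conditional expectation $G(\cdot)$ is in fact uniformly continuous
on $\mathcal{X}^{*-}$ with respect to $d^*(\cdot, \cdot)$. The proof of Theorem (A) is complete.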



5 Time Average Performance 

If the process does not have continuous conditional expectations then the last
step in the proof of Theorem (A) is not valid. It can be carried out for most
time instances $n$ by using the typical behaviour of almost every realization
$x_{-\infty}^{\infty}$. More specifically, for every $\delta > 0$, the probability of the set of those
$x_{-\infty}^0$ for which

$$|E(g(X_1)|X_{-k+1}^0 = x_{-k+1}^0) - G(x_{-\infty}^0)| < \delta \quad \text{for all } k \ge K$$

tends to one as $K$ tends to infinity. The typical behaviour we are after is the
statement that for most of the times $t = n - \tau_j^{\kappa_n}(n)$ the sequence $T^t x_{-\infty}^{\infty}$ belongs
to the above mentioned set. While this need not be the case for all $n$, it is
true for most $n$'s and the next lemma makes this precise. For the analysis
we will fix a value of $\kappa_n$ at $k$.
Define the set of good indexes $M_n(\delta, K) \subseteq \{K-1, \dots, n-1\}$ as

$$M_n(\delta, K) = \{K-1 \le i \le n-1 : |E(g(X_{i+1})|X_{i-k+1}^i) - G(X_{-\infty}^i)| < \delta \text{ for all } k \ge K\}.$$

We will analyze the behaviour of our algorithm for $\kappa_n = k$ for each $i < n$ by
first dividing up the indices $\{1, 2, \dots, n\}$ according to the value of $X_{i-k+1}^i =
y_{-k+1}^0$, and considering what happens for each of these.
Let $y_{-k+1}^0 \in \mathcal{X}^k$. Define the set of indexes $I_n^k(y_{-k+1}^0) \subseteq \{k-1, \dots, n-1\}$
where you can find the pattern $y_{-k+1}^0$, that is,

$$I_n^k(y_{-k+1}^0) = \{k-1 \le i \le n-1 : X_{i-k+1}^i = y_{-k+1}^0\}.$$

Define $D_k(i)$ as

$$D_k(i) = \begin{cases} \{i - \tau_j^k(i) : \tau_j^k(i) \le i-k+1 \text{ and } 1 \le j \le i+1\} & \text{if } \tau_{J_i}^k(i) \le i-k+1, \\ \emptyset & \text{otherwise.} \end{cases}$$

Let $E_n^k(\delta, K)$ be defined as

$$E_n^k(\delta, K) = \{0 \le i \le n-1 : |D_k(i) \cap M_n(\delta, K)| \ge (1 - \delta^{0.5})|D_k(i)|\}.$$

If the number of occurrences of $y_{-k+1}^0$ prior to $i$ was not enough for our
algorithm then $D_k(i)$ will be empty. This is rare, and can be expressed as
follows: Let

$$F_n^k = \{0 \le i \le n-1 : D_k(i) = \emptyset\}.$$

It is immediate that

$$|F_n^k| \le |\mathcal{X}|^k J_n. \qquad (8)$$

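Lemma 3 Assume $|M_n(\delta, K)| \ge (1 - \delta)n$. Then

$$|\{0 \le l \le n-1 : l \notin E_n^k(\delta, K) \text{ and } l \notin F_n^k\}| \le \delta^{0.5} n.$$

Proof Fix $\delta$, $K$, $k$ and $n$. Temporarily fix also $y_{-k+1}^0 \in \mathcal{X}^k$. Let
$z = |I_n^k(y_{-k+1}^0)|$ and let $k-1 \le i_1 < i_2 < \dots < i_z$ denote the elements of
$I_n^k(y_{-k+1}^0)$. Let $i_{j_1}(y_{-k+1}^0)$ be the largest element $i_{j_1}$ of $I_n^k(y_{-k+1}^0)$ such that
$D_k(i_{j_1}) \ne \emptyset$ and

$$|\{0 \le l \le n-1 : l \in D_k(i_{j_1}) \text{ and } l \notin M_n(\delta, K)\}| > \delta^{0.5}|D_k(i_{j_1})|.$$

Define $S$ to be the set of these indexes as $y_{-k+1}^0$ varies over all elements of $\mathcal{X}^k$.
It is clear that if $i, j \in S$, $i \ne j$ then $D_k(i) \cap D_k(j) = \emptyset$ since different blocks
$y_{-k+1}^0$ are involved. It follows from the construction that $\{D_k(i)\}_{i\in S}$ is a
disjoint cover of $\{0 \le l \le n-1 : l \notin E_n^k(\delta, K) \text{ and } l \notin F_n^k\}$. It follows that

$$n\delta \ge |\{0 \le l \le n-1 : l \notin M_n(\delta, K)\}|$$

$$\ge \sum_{i\in S} |\{0 \le l \le n-1 : l \in D_k(i) \text{ and } l \notin M_n(\delta, K)\}|$$

$$\ge \delta^{0.5}\sum_{i\in S} |D_k(i)| = \delta^{0.5}\left| \bigcup_{i\in S} D_k(i) \right|.$$

Now

$$|\{0 \le l \le n-1 : l \notin E_n^k(\delta, K) \text{ and } l \notin F_n^k\}| \le \left| \bigcup_{i\in S} D_k(i) \right| \le \delta^{0.5} n$$

and the proof of Lemma 3 is complete.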
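Proof of Theorem (B). Consider

$$\frac{1}{n}\sum_{i=0}^{n-1} |g_i - E(g(X_{i+1})|X_0^i)| \le \frac{|0 - E(g(X_1)|X_0)|}{n}$$

$$+ \frac{1}{n}\sum_{i=1}^{n-1} 1_{\{\kappa_i < K_i\}} |g_i - E(g(X_{i+1})|X_0^i)|$$

$$+ \frac{1}{n}\sum_{i=1}^{n-1} \max_{J=J_i,\dots,i} \left| \frac{1}{J}\sum_{j=1}^{J} \left[ g(X_{i-\tau_j^{K_i}(i)+1}) - G(X_{-\infty}^{i-\tau_j^{K_i}(i)}) \right] \right|$$

$$+ \frac{1}{n}\sum_{i=1}^{n-1} 1_{\{\kappa_i = K_i\}} \left| \frac{1}{\lambda_i}\sum_{j=1}^{\lambda_i} G(X_{-\infty}^{i-\tau_j^{K_i}(i)}) - E(g(X_{i+1})|X_{i-K_i+1}^i) \right|$$

$$+ \frac{1}{n}\sum_{i=1}^{n-1} |E(g(X_{i+1})|X_{i-K_i+1}^i) - E(g(X_{i+1})|X_0^i)|.$$

The first term tends to zero. The second term tends to zero since by (8)
$|F_n^{K_n}|/n \le |\mathcal{X}|^{K_n} J_n/n \to 0$.

Concerning the third term, by (7) and by Azuma's exponential bound (cf.
Azuma [1]),

$$P\left( \max_{J=J_i,\dots,i} \left| \frac{1}{J}\sum_{j=0}^{J-1} \left[ g(\tilde{X}^{(K_i,i,J)}_{\hat{\tau}_j^{K_i}+1}) - G(\dots, \tilde{X}^{(K_i,i,J)}_{\hat{\tau}_j^{K_i}-1}, \tilde{X}^{(K_i,i,J)}_{\hat{\tau}_j^{K_i}}) \right] \right| > \epsilon \right) \le 2ie^{-\epsilon^2 J_i/(2B^2)}$$

(where $B$ is any real such that $2\max_{x\in\mathcal{X}} |g(x)| < B$) and the right hand side
is summable, hence the Borel-Cantelli Lemma yields almost sure convergence
to zero. By the Toeplitz lemma the average also converges to zero.

Now we deal with the fourth term. Let $0 < \epsilon < 1$ be arbitrary. Choose
the integer $d$ large enough such that $|\mathcal{X}|^{-10(d-2)} < \epsilon$. Let $\delta = \epsilon/d^2$. Let $K$
and $N_0$ be so large that $|M_n(\delta, K)|/n \ge (1 - \delta)$ for all $n \ge N_0$. (There exist
such $K$ and $N_0$ since by the ergodic theorem and the martingale convergence
theorem $\lim_{K\to\infty} \lim_{n\to\infty} |M_n(\delta, K)|/n = 1$ almost surely.) Now let $N_1 > N_0$ be so
large that $K_n - d + 2 > K$ and $|\mathcal{X}|^{10(K_n-d+1)} > N_0$ for all $n \ge N_1$. Assume
$n > N_1$. The sum

$$\frac{1}{n}\sum_{i=1}^{n-1} 1_{\{\kappa_i = K_i\}} \left| \frac{1}{\lambda_i}\sum_{j=1}^{\lambda_i} G(X_{-\infty}^{i-\tau_j^{K_i}(i)}) - E(g(X_{i+1})|X_{i-K_i+1}^i) \right|$$

that we are trying to estimate will be divided into blocks according to the
value of $K_i$. In fact only values in the range $[K_n-d+2, K_n]$ need be considered
since the sum up to $|\mathcal{X}|^{10(K_n-d+2)}$ can be estimated by $|\mathcal{X}|^{10(K_n-d+2)}\, 2\max_{y\in\mathcal{X}} |g(y)|$
and so by our assumption on $d$, after dividing by $n$ this will be at most
$\epsilon\, 2\max_{y\in\mathcal{X}} |g(y)|$. For $i$ in the range $[|\mathcal{X}|^{10k}, |\mathcal{X}|^{10(k+1)})$ for $K_n - d + 2 \le k <
K_n$, and $\kappa_i = K_i$, if $i \in E^k_{|\mathcal{X}|^{10(k+1)}}(\delta, K)$ then we get for more than $(1 - \sqrt{\delta})|D_k(i)|$
terms an upper bound of $\delta$ while for the rest we may use $2\max_{y\in\mathcal{X}} |g(y)|$.
This gives an upper bound of

$$\frac{\delta|D_k(i)| + \sqrt{\delta}|D_k(i)|\, 2\max_{y\in\mathcal{X}} |g(y)|}{|D_k(i)|}.$$

Using Lemma 3 we can estimate the sum over all $i$ in the interval $[|\mathcal{X}|^{10k}, |\mathcal{X}|^{10(k+1)})$
by

$$n(\delta + \sqrt{\delta}\, 2\max_{y\in\mathcal{X}} |g(y)|) + \sqrt{\delta}\, n\, 2\max_{y\in\mathcal{X}} |g(y)|.$$

Dividing by $n$, we have an upper bound:

$$\delta + \sqrt{\delta}\, 2\max_{y\in\mathcal{X}} |g(y)| + \sqrt{\delta}\, 2\max_{y\in\mathcal{X}} |g(y)|.$$

The same argument yields the same upper bound for the $i$'s in the range
$[|\mathcal{X}|^{10K_n}, n)$.

Summing over $k$ in the range $[K_n - d + 2, K_n + 1]$ yields an upper bound:

$$d\delta + d\sqrt{\delta}\, 2\max_{y\in\mathcal{X}} |g(y)| + d\sqrt{\delta}\, 2\max_{y\in\mathcal{X}} |g(y)|.$$

Recall that $\sqrt{\delta}\, d = \sqrt{\epsilon}$ and this yields an upper bound:

$$\epsilon + \sqrt{\epsilon}\, 2\max_{y\in\mathcal{X}} |g(y)| + \sqrt{\epsilon}\, 2\max_{y\in\mathcal{X}} |g(y)|.$$

Since $\epsilon$ was arbitrary, the fourth term tends to zero.

Now we deal with the last term. Since by the martingale convergence theorem
$E(g(X_1)|X_{-i}^0) \to G(X_{-\infty}^0)$ almost surely, we have

$$\lim_{i\to\infty} |E(g(X_1)|X_{-K_i+1}^0) - E(g(X_1)|X_{-i}^0)| = 0$$

and applying Breiman's generalized ergodic theorem, cf. Maker [14] (or Algoet [2]),

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n-1} |E(g(X_{i+1})|X_{i-K_i+1}^i) - E(g(X_{i+1})|X_0^i)| = 0$$

almost surely and the proof of Theorem (B) is complete.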



6 Weak Consistency 

Proof of Theorem (C). 

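In order to show that for all ergodic stationary processes our estimate $g_n$
converges in probability we follow the steps in the proof of Theorem (A).
The probability that

$$|g_n - E(g(X_{n+1})|X_0^n)| > 3\epsilon$$

can be estimated as the sum of the probabilities of several sets:

$$P\left( \max_{J=J_n,\dots,n}\ \max_{k=1,\dots,K_n} \left| \frac{1}{J}\sum_{j=1}^{J} \left[ g(X_{n-\tau_j^k(n)+1}) - G(X_{-\infty}^{n-\tau_j^k(n)}) \right] \right| > \epsilon \right),$$

$$P(\kappa_n < K_n),$$

$$P\left( |E(g(X_{n+1})|X_0^n) - E(g(X_{n+1})|X_{n-K_n+1}^n)| > \epsilon \right)$$

and

$$P\left( \left| \frac{1}{\lambda_n}\sum_{j=1}^{\lambda_n} G(X_{-\infty}^{n-\tau_j^{K_n}(n)}) - E(g(X_{n+1})|X_{n-K_n+1}^n) \right| > \epsilon,\ \kappa_n = K_n \right).$$

For the first, the argument given in the proof of Theorem (A) suffices. Concerning the second, it
tends to zero by Lemma 4 in the Appendix. (Apply it with $A = \{X_{-K_n+1}^0 = x_{n-K_n+1}^n\}$,
$D = J_n$. Then sum over all possible $x_{n-K_n+1}^n$ to get that this
second probability in question is not greater than $|\mathcal{X}|^{K_n} J_n/n$, which tends to
zero.) For the third, it is easy to see that it tends to zero by stationarity and
by the martingale convergence theorem, which implies that

$$\lim_{n\to\infty} P\left( |E(g(X_1)|X_{-n}^0) - E(g(X_1)|X_{-K_n+1}^0)| > \epsilon \right) = 0.$$

We concentrate on the last probability. Recall the notations from the proof
of Theorem (B). The main thing is to show that with probability at least
$1 - \epsilon$, for $n$ sufficiently large, most of the elements $l \in I_n^{K_n}(X_{n-K_n+1}^n)$ are such
that $T^l x_{-\infty}^{\infty}$ does not belong to the set

$$M_n(\epsilon) = \{x_{-\infty}^{\infty} : |E(g(X_1)|X_{-k+1}^0 = x_{-k+1}^0) - G(x_{-\infty}^0)| > \epsilon \text{ for some } k \ge K_n\}$$

as neither does $T^n x_{-\infty}^{\infty}$ itself. By the martingale convergence theorem, the
probability of the set $M_n(\epsilon)$ tends to zero as $n$ tends to infinity. Let $n$ be so
large that this probability in question is less than $\epsilon^2$. Let

$$B_n = \left\{ x_{-\infty}^{\infty} : |\{l \in I_n^{K_n}(x_{n-K_n+1}^n) : T^l x_{-\infty}^{\infty} \in M_n(\epsilon)\}| > \epsilon\, |I_n^{K_n}(x_{n-K_n+1}^n)| \right\}.$$

The probability of $B_n$ will be evaluated using the ergodic theorem along
the orbit of a typical point. Let $x_{-\infty}^{\infty}$ be such a typical orbit and $N$ be a
very large number. Fix $y_{-K_n+1}^0$, and note those elements in $I_N^{K_n}(y_{-K_n+1}^0)$
that belong to $I_N(B_n)$. We will cover them with disjoint blocks of length $n$,
beginning on the right end $N - 1$ in the obvious way. These sets (subsets
of $I_N^{K_n}(y_{-K_n+1}^0)$) will be denoted $C_r(y_{-K_n+1}^0)$ where $r = 1, 2, \dots$. Formally, let $\dots <
l_2 < l_1$ denote the elements of $I_N^{K_n}(y_{-K_n+1}^0)$. Let $C_0(y_{-K_n+1}^0) = \emptyset$. For
$r \ge 1$ we define $C_r(y_{-K_n+1}^0)$ recursively. Let $l$ be the largest index such that
$l \ge n$, $l \in I_N^{K_n}(y_{-K_n+1}^0)$, $l \notin \bigcup_{r'<r} C_{r'}(y_{-K_n+1}^0)$ and $x_{-\infty}^{\infty} \in T^{-l+n} B_n$. If there is such $l$ then set
$C_r(y_{-K_n+1}^0) = \{l - n + K_n - 1 \le l_i \le l : i = 1, 2, \dots\}$. Let $R(y_{-K_n+1}^0)$ be
the largest $r$ for which $C_r(y_{-K_n+1}^0)$ is defined. Let

$$I_N(M_n(\epsilon)) = \{0 \le l \le N-1 : T^l x_{-\infty}^{\infty} \in M_n(\epsilon)\}.$$

Then by the construction of $C_r(y_{-K_n+1}^0)$, for each $1 \le r \le R(y_{-K_n+1}^0)$,

$$|\{l \in C_r(y_{-K_n+1}^0) : T^l x_{-\infty}^{\infty} \in M_n(\epsilon)\}| > \epsilon\, |C_r(y_{-K_n+1}^0)|.$$

Since $x_{-\infty}^{\infty}$ is typical, for large $N$, $|I_N(M_n(\epsilon))| < \epsilon^2 N$ and

$$|I_N(M_n(\epsilon))| > \epsilon \sum_{y_{-K_n+1}^0 \in \mathcal{X}^{K_n}}\ \sum_{r=1}^{R(y_{-K_n+1}^0)} |C_r(y_{-K_n+1}^0)|.$$

Let

$$I_N(B_n) = \{n \le l \le N-1 : T^{l-n} x_{-\infty}^{\infty} \in B_n\}.$$

But those $n \le l \le N-1$ such that $T^{l-n} x_{-\infty}^{\infty} \in B_n$ are covered by this union,
thus

$$\epsilon\, |I_N(B_n)| < \epsilon^2 N$$

and thus

$$P(B_n) = \lim_{N\to\infty} \frac{|I_N(B_n)|}{N} \le \epsilon,$$

since $x_{-\infty}^{\infty}$ was typical. The proof of the Theorem is complete.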



7 Appendix 

Lemma 4 Let $\{X_n\}$ be stationary and ergodic. For an arbitrary set $A$ measurable
with respect to $\sigma(X_{-\infty}^0)$, the probability of the event

$$A(n, D) = \left\{ x_{-\infty}^{\infty} \in A : \sum_{l=0}^{n-1} 1_A(T^{-l} x_{-\infty}^{\infty}) < D \right\}$$

is not greater than $D/n$.
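
The covering argument behind Lemma 4 is deterministic along any single orbit, so it can be checked numerically. The Python sketch below takes $A$ to be the event that the current symbol equals a given value and reads the sum in the definition of $A(n, D)$ as counting visits to $A$ among the $n$ most recent time points; both are assumptions of this illustration, as are the function name `rare_recurrence_times` and the simulated i.i.d. sequence. For integer $D$ the number of times in $[n, N)$ at which the shifted orbit lies in $A(n, D)$ is at most $DN/(n+1)$ for every sequence.

```python
import random
from typing import List, Sequence


def rare_recurrence_times(x: Sequence[int], symbol: int, n: int, D: int) -> List[int]:
    """Times l in [n, len(x)) such that x[l] == symbol and `symbol` occurs
    fewer than D times among x[l-n+1], ..., x[l]."""
    hits = [1 if s == symbol else 0 for s in x]
    prefix = [0]
    for h in hits:
        prefix.append(prefix[-1] + h)  # prefix[m] = number of hits in x[0..m-1]
    times = []
    for l in range(n, len(x)):
        visits = prefix[l + 1] - prefix[l - n + 1]  # hits in x[l-n+1..l]
        if hits[l] and visits < D:
            times.append(l)
    return times


if __name__ == "__main__":
    random.seed(0)
    N, n, D = 20000, 200, 5
    x = [1 if random.random() < 0.05 else 0 for _ in range(N)]
    times = rare_recurrence_times(x, symbol=1, n=n, D=D)
    # The empirical frequency should not exceed the bound D / n of Lemma 4.
    print(len(times) / N, D / n)
```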
Proof Fix a typical orbit $x_{-\infty}^{\infty}$. Let

$$I_N(A(n, D)) = \{n \le l \le N-1 : T^l x_{-\infty}^{\infty} \in A(n, D)\}.$$

We make a disjoint cover. Let $\dots < l_2 < l_1$ denote the elements of $I_N(A(n, D))$.
Set $E_0 = \emptyset$ and for $r = 1, 2, \dots$ define $E_r$ recursively. Let $l$ denote the largest
element of $I_N(A(n, D))$ such that $l \notin \bigcup_{r'<r} E_{r'}$, if there is such, and let

$$E_r = \{l - n \le l_i \le l : i = 1, 2, \dots\}.$$

Now let $R$ denote the largest $r$ for which $E_r$ has been defined. Since the
cover is disjoint, $R(n + 1) \le N$. Then clearly,

$$\frac{|I_N(A(n, D))|}{N} \le \frac{RD}{R(n+1)} = \frac{D}{n+1}$$

and the left hand side tends to $P(A(n, D))$. The proof of Lemma 4 is complete.